

Search for: All records

Creators/Authors contains: "Mirrokni, Vahab"

Note: Clicking on a Digital Object Identifier (DOI) takes you to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo period.

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Free, publicly-accessible full text available July 13, 2026
  2. In the maximum coverage problem we are given d subsets of a universe [n], and the goal is to output k subsets such that their union covers the largest possible number of distinct items. We present the first algorithm for maximum coverage in the turnstile streaming model, where updates that insert or delete an item from a subset arrive one by one. Notably, our algorithm uses only polylog(n) update time. We also present turnstile streaming algorithms for targeted and general fingerprinting for risk management, where the goal is to determine which features pose the greatest re-identification risk in a dataset. As part of our work, we give a result of independent interest: an algorithm to estimate the complement of the p-th frequency moment of a vector for p ≥ 2. Empirical evaluation confirms the practicality of our fingerprinting algorithms, demonstrating a speedup of up to 210x over prior work. (A small offline greedy sketch of the maximum coverage objective appears after this list.)
    Free, publicly-accessible full text available July 13, 2026
  3. Training with noisy labels often yields suboptimal performance, but retraining a model with its own predicted hard labels (binary 1/0 outputs) has been empirically shown to improve accuracy. This paper provides the first theoretical characterization of this phenomenon. In the setting of linearly separable binary classification with randomly corrupted labels, the authors prove that retraining can indeed improve the population accuracy compared to initial training with noisy labels. Retraining also has practical implications for local label differential privacy (DP), where models are trained with noisy labels. The authors propose consensus-based retraining, in which retraining is done only on samples whose predicted label matches the given noisy label. This approach significantly improves DP training accuracy at no additional privacy cost. For example, training ResNet-18 on CIFAR-100 with ε = 3 label DP yields an accuracy improvement of over 6% with consensus-based retraining. (A minimal sketch of consensus-based retraining appears after this list.)
    Free, publicly-accessible full text available May 7, 2026
  4. Agrawal, Shipra; Roth, Aaron (Ed.)
  5. We introduce the ParClusterers Benchmark Suite (PCBS): a collection of highly scalable parallel graph clustering algorithms and benchmarking tools that streamlines comparing different graph clustering algorithms and implementations. The benchmark includes clustering algorithms that target a wide range of modern clustering use cases, including community detection, classification, and dense subgraph mining. The benchmark toolkit makes it easy to run and evaluate multiple instances of different clustering algorithms with respect to both running time and quality. We evaluate the PCBS algorithms empirically and find that they deliver state-of-the-art quality and running time. In terms of running time, they are on average over 4x faster than the fastest library we compared against. In terms of quality, the correlation clustering algorithm [Shi et al., VLDB'21] optimizing for the LambdaCC objective, which has no direct counterpart in other libraries, delivers the highest quality on the majority of the datasets we used. (A generic benchmark-loop sketch appears after this list.)
    Free, publicly-accessible full text available November 1, 2025
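
The following is a minimal, self-contained sketch of the maximum coverage objective from item 2, using the classic offline greedy heuristic. It is not the paper's turnstile streaming algorithm; the example subsets and the budget k are illustrative assumptions.

```python
# Offline greedy sketch of maximum coverage (item 2): pick k of the given
# subsets so that their union covers as many distinct items as possible.
# NOT the paper's turnstile streaming algorithm; illustration only.

def greedy_max_coverage(subsets, k):
    """subsets: list of sets over a universe [n]; k: number of subsets to pick."""
    covered = set()
    chosen = []
    for _ in range(min(k, len(subsets))):
        # Pick the subset that adds the most new (uncovered) items.
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] - covered))
        if not subsets[best] - covered:
            break  # no remaining subset adds new coverage
        chosen.append(best)
        covered |= subsets[best]
    return chosen, covered

# Illustrative example with d = 4 subsets over the universe {0, ..., 9}.
subsets = [{0, 1, 2}, {2, 3, 4, 5}, {5, 6}, {6, 7, 8, 9}]
chosen, covered = greedy_max_coverage(subsets, k=2)
print(chosen, len(covered))  # picks indices [1, 3], covering 8 distinct items
```

Below is a minimal sketch of the consensus-based retraining idea from item 3, assuming a synthetic linearly separable dataset, random label flips, and a least-squares linear classifier as a stand-in for training; it is not the paper's exact procedure or its DP setup.

```python
# Sketch of consensus-based retraining (item 3): retrain only on samples
# where the model's own hard prediction agrees with the given noisy label.
# The linear model and synthetic noisy data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable data with randomly flipped (noisy) labels.
n, d, flip_prob = 2000, 20, 0.3
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y_clean = np.sign(X @ w_star)
flips = rng.random(n) < flip_prob
y_noisy = np.where(flips, -y_clean, y_clean)

def fit_linear(X, y):
    # Least-squares linear classifier as a stand-in for "training".
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Round 1: train on noisy labels, then take hard predictions.
w1 = fit_linear(X, y_noisy)
y_pred = np.sign(X @ w1)

# Round 2 (consensus): keep only samples where prediction == noisy label.
consensus = y_pred == y_noisy
w2 = fit_linear(X[consensus], y_noisy[consensus])

acc = lambda w: np.mean(np.sign(X @ w) == y_clean)
print(f"round 1 accuracy: {acc(w1):.3f}, consensus retraining: {acc(w2):.3f}")
```

Finally, here is a generic sketch of the benchmark workflow described in item 5: run several clustering implementations on the same graph and record both running time and a quality score. It does not use PCBS itself; the networkx community methods and the modularity metric are stand-ins for the suite's algorithms and quality measures.

```python
# Generic clustering-benchmark loop in the spirit of item 5: time several
# implementations and score each clustering with a quality metric.
# NOT the PCBS toolkit; networkx methods and modularity are stand-ins.
import time
import networkx as nx
from networkx.algorithms import community

G = nx.karate_club_graph()  # small illustrative graph

algorithms = {
    "louvain": lambda g: community.louvain_communities(g, seed=0),
    "label_propagation": lambda g: list(community.label_propagation_communities(g)),
}

for name, cluster in algorithms.items():
    start = time.perf_counter()
    communities = cluster(G)
    elapsed = time.perf_counter() - start
    quality = community.modularity(G, communities)  # quality score
    print(f"{name}: {elapsed * 1e3:.2f} ms, modularity = {quality:.3f}")
```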
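Each sketch above is a hand-written illustration of the problem or workflow named in the corresponding abstract, under the stated assumptions, and should not be read as the authors' implementation.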
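In particular, the greedy heuristic in the first sketch is only the standard offline baseline for maximum coverage; the streaming, fingerprinting, and privacy results summarized in the abstracts are described in the linked publications.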